# Core data science ecosystem
library(tidyverse) # Data manipulation, summarization, ggplot2
library(janitor) # Data cleaning, variable standardization
library(lubridate) # Date and time parsing
library(readxl) # Importing Excel datasets
# Visualization and color management
library(colorspace) # Perceptually uniform palettes
library(colorblindcheck) # Accessibility diagnostics for color palettes
library(scales) # Formatting axes, labels, and scales
library(patchwork) # Combining multiple plots into one layout
# Additional utilities
library(knitr) # Reporting utilities in Quarto/R Markdown
Data analysis and visualization in R
Module 5: Visual narrative, EDA, colors, and accessibility
Design an epidemiological data analysis and visualization project
This document integrates the conceptual, methodological, and technical competencies developed throughout the previous modules of the course. Its objective is to guide the participant through a complete and reproducible workflow for epidemiological data analysis using the R ecosystem. The material covers the sequential stages of a rigorous analytic pipeline, including data importation, inspection of data structures, cleaning and harmonization, exploratory analysis, generation of summary statistics, identification of anomalies, and the construction of visualizations that support analytical reasoning and evidence-based interpretation. Emphasis is placed on the design of effective figures, the evaluation of alternative graphical encodings, and the application of accessibility principles such as perceptually uniform color palettes and color-blind-safe schemes.
The dataset employed consists of records of dengue cases obtained from real epidemiological surveillance sources and complemented with simulated data to illustrate methodological challenges frequently encountered in practice, such as inconsistent formats, missing values, atypical observations, and heterogeneous spatial or temporal resolution. The database includes climatic, demographic, and geographic variables commonly used in epidemiological analyses, enabling the participant to explore associations between disease incidence and environmental factors, assess temporal patterns, and develop visual narratives that communicate findings with clarity, rigor, and interpretability. Through these exercises, participants will consolidate their proficiency in R and strengthen their ability to construct reproducible analyses aligned with best practices in public health data science.
Load the libraries
The analytical workflow begins with the configuration of a coherent and reproducible working environment in R. This step ensures that all required packages are available, loaded, and functioning correctly before any data manipulation or visualization is performed. Establishing a well-defined environment is essential for ensuring consistency across analyses, facilitating collaboration, and preventing errors arising from version incompatibilities or missing dependencies.
In this module, we rely on a set of packages from the tidyverse ecosystem to manage data structures, perform transformations, and construct statistical graphics under a unified grammar. Additional libraries support tasks related to data cleaning, date handling, accessibility evaluation, and the implementation of perceptually uniform color scales—elements fundamental for producing rigorous and reproducible epidemiological visualizations.
Before loading the packages, verify that they are installed in your local R environment. A missing package can be installed with install.packages("package_name"); for example, install.packages("tidyverse") installs the tidyverse.
Once all required packages are installed, they are loaded with the library() calls listed at the top of this document.
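As a minimal sketch of the verification step, the helper below reports which packages from a list are not yet installed, without installing anything itself. The function name find_missing_packages() is hypothetical (it does not come from any package), and the package list simply mirrors the library() calls used in this module.

```r
# Hypothetical helper: return the names of packages that are not installed.
find_missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages loaded in this module
required <- c("tidyverse", "janitor", "lubridate", "readxl", "colorspace",
              "colorblindcheck", "scales", "patchwork", "knitr")

missing_pkgs <- find_missing_packages(required)

if (length(missing_pkgs) > 0) {
  message("Install before continuing: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install them
} else {
  message("All required packages are available.")
}
```

Keeping the installation call commented out is a deliberate choice here: a report can then be rendered without silently modifying the local library.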
Once the environment has been initialized, participants should confirm that their working directory is correctly set and that the local structure of files is organized to support reproducibility. This includes ensuring that datasets, scripts, and outputs (figures, tables, and derived data) are stored in appropriately labeled folders. A disciplined setup at this stage provides a solid foundation for the subsequent stages of epidemiological data analysis.
A well-organized directory structure is fundamental for ensuring reproducibility, transparency, and efficient collaboration in data-driven epidemiological analyses. As a recommended practice, each project should be contained within a dedicated folder named according to the study or initiative—for example, ASIS. Within this main directory, it is advisable to create a set of subfolders that separate code, data, and analytical outputs. A common and effective structure includes:
code/: scripts used for importing, cleaning, transforming, and analyzing the data.
data/
    raw/: original datasets stored exactly as received, without modifications.
    processed/: cleaned, harmonized, or transformed datasets generated during the analysis.
results/
    figures/: visualizations produced during the exploratory and inferential steps.
    tables/: summary statistics, model outputs, and tabulated results.
This hierarchical organization facilitates traceability, prevents accidental overwriting of original data, and supports a seamless workflow when producing reproducible reports with Quarto. It also enables clear version control and easier communication of analytic decisions during collaborative work or peer review.
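The folder layout described above can be created from R itself. The sketch below assumes a project called ASIS and, purely for demonstration, builds it inside a temporary directory; project_root is an illustrative name and should point to the real project location in practice.

```r
# Assumption: the project lives in a temporary directory for this demo;
# in practice, replace `project_root` with the real project path.
project_root <- file.path(tempdir(), "ASIS")

subfolders <- c("code", "data/raw", "data/processed",
                "results/figures", "results/tables")
project_dirs <- file.path(project_root, subfolders)

# recursive = TRUE also creates the intermediate data/ and results/ folders
for (d in project_dirs) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

# List the folders that now exist under the project root
list.dirs(project_root, full.names = FALSE, recursive = TRUE)
```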
Color Definitions and Principles for Consistent Use
A coherent and well-documented color strategy is essential for producing visualizations that are accurate, accessible, and interpretable in epidemiological contexts. Color must not serve merely as decoration; it functions as a perceptual encoding that guides attention, emphasizes contrasts, and supports analytical reasoning. For this reason, we adopt a structured set of palettes—sequential, diverging, and qualitative—each selected according to the type and scale of the variable being represented. All palettes included are perceptually uniform or color-blind-safe, ensuring accessibility and consistency throughout the analytical workflow.
Sequential Palettes
Sequential palettes are appropriate for variables measured on a continuous scale and where magnitude carries interpretive meaning, such as incidence rates, temperature, rainfall, or risk scores. These palettes encode increasing intensity through a smooth progression of luminance.
Recommended options:
pal_seq_viridis <- colorspace::sequential_hcl(7, palette = "Viridis")
pal_seq_blues <- colorspace::sequential_hcl(7, palette = "Blues")
pal_seq_inferno <- colorspace::sequential_hcl(7, palette = "Inferno")
Use when:
Representing epidemiological counts or rates.
Mapping gradients in heatmaps, temporal trends, or geospatial incidence surfaces.
Emphasizing low-to-high transitions without categorical breaks.
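As an illustration of the heatmap use case, the sketch below maps a continuous rate to fill with the continuous sequential scale that colorspace provides for ggplot2. The data are simulated and the municipality labels are hypothetical; only the palette name comes from the options listed above.

```r
library(ggplot2)
library(colorspace)

set.seed(1)
# Simulated weekly incidence for four hypothetical municipalities
incidencia <- expand.grid(semana = 1:12,
                          municipio = c("A", "B", "C", "D"))
incidencia$tasa <- round(runif(nrow(incidencia), min = 0, max = 50), 1)

p_seq <- ggplot(incidencia, aes(x = semana, y = municipio, fill = tasa)) +
  geom_tile(color = "white") +
  # Continuous sequential HCL scale from colorspace
  scale_fill_continuous_sequential(palette = "Viridis") +
  labs(x = "Epidemiological week", y = "Municipality", fill = "Rate") +
  theme_minimal()

p_seq
```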
Diverging Palettes
Diverging palettes are intended for variables with a meaningful central reference point—for example, deviations from baseline, anomalies relative to average temperature, percent change, or differences before and after an intervention. These palettes emphasize both directionality and magnitude.
Recommended options:
pal_div_blue_red <- colorspace::diverging_hcl(9, palette = "Blue-Red 3")
pal_div_green_brown <- colorspace::diverging_hcl(9, palette = "Green-Brown")
Use when:
Comparing increases vs. decreases in incidence.
Visualizing residuals, standardized differences, or temporal anomalies.
Communicating values on both sides of a reference threshold.
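For a diverging palette to be read correctly, its neutral color has to sit on the reference value. The sketch below shows one way to do this with the continuous diverging scale from colorspace, using its mid argument. The percent-change values are simulated and the municipality labels are hypothetical.

```r
library(ggplot2)
library(colorspace)

set.seed(2)
# Simulated percent change in incidence relative to a baseline (hypothetical)
cambio <- data.frame(
  municipio = factor(LETTERS[1:8]),
  cambio_pct = round(rnorm(8, mean = 0, sd = 20), 1)
)

p_div <- ggplot(cambio, aes(x = cambio_pct, y = municipio, fill = cambio_pct)) +
  geom_col() +
  # mid = 0 pins the neutral color of the palette to "no change"
  scale_fill_continuous_diverging(palette = "Blue-Red 3", mid = 0) +
  labs(x = "Change vs. baseline (%)", y = "Municipality", fill = "Change (%)") +
  theme_minimal()

p_div
```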
Qualitative Palettes
Qualitative palettes are suitable for categorical variables with no intrinsic order, such as regions, municipalities, vector species, or diagnostic categories. Colors must be distinguishable and carry equal perceptual weight.
Recommended options:
pal_qual_dark3 <- colorspace::qualitative_hcl(8, palette = "Dark 3")
pal_qual_set2 <- colorspace::qualitative_hcl(8, palette = "Set 2")
pal_qual_harmonic <- colorspace::qualitative_hcl(8, palette = "Harmonic")
Use when:
Visualizing multiple administrative units.
Differentiating categories with similar epidemiological importance.
Avoiding perceptual hierarchies where no order is intended.
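One way to keep category colors consistent across figures is to store the palette as a named vector, so each category keeps its color even when a plot shows only a subset. The sketch below assumes three hypothetical region names; pal_region is an illustrative object, not one defined elsewhere in this module.

```r
library(colorspace)

# Hypothetical categories; in practice use the levels present in your data
regiones <- c("Caribe", "Andina", "Amazonia")

# Named vector: each category is tied to one fixed color
pal_region <- setNames(
  qualitative_hcl(length(regiones), palette = "Dark 3"),
  regiones
)

pal_region
# The named vector can then be passed to scale_color_manual(values = pal_region)
# or scale_fill_manual(values = pal_region) in any plot.
```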
Special Palette for Sex-Stratified Comparisons
Epidemiological analyses frequently require comparison between men and women. The use of culturally ambiguous or stereotypical colors is discouraged; instead, we apply a palette that is perceptually balanced, color-blind-safe, and maintains clear contrast between groups.
Recommended options:
pal_sex <- c(
  "Mujeres" = "#A9A9A9", # Dark gray
  "Hombres" = "#708090"  # Slate gray
)
Justification:
Both colors are neutral grays, so the encoding avoids cultural pink/blue stereotypes.
The pair differs mainly in lightness rather than hue, so the distinction does not depend on hue discrimination; the diagnostics in the next section quantify how well it holds under simulated color-vision deficiencies.
Where stronger contrast is needed, the blue (#0072B2) and vermilion (#D55E00) of the Okabe & Ito palette, which was designed for color-blind accessibility, are a common alternative.
Color-Accessibility Diagnostics
Ensuring that visualizations are accessible to individuals with color-vision deficiencies (CVD) is a central requirement for scientific communication. Epidemiological analyses frequently inform decision-making among diverse audiences, including public health officials, clinicians, researchers, and community stakeholders. Consequently, all visual encodings must remain interpretable under common forms of color-blindness such as protanopia, deuteranopia, and tritanopia.
To support accessibility, we incorporate systematic diagnostic tools provided by the colorblindcheck package. This package simulates how plots appear under different CVD conditions and evaluates whether the chosen palette preserves sufficient perceptual contrast. These diagnostics should be applied before adopting any palette in recurrent analyses or final reporting.
The following example shows how to evaluate the palette defined for sex-stratified comparisons:
# Evaluate perceptual distinguishability of the sex palette
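colorblindcheck::palette_check(pal_sex)
          name n tolerance ncp ndcp min_dist mean_dist max_dist
1       normal 2  16.74923   1    1 16.74923  16.74923 16.74923
2 deuteranopia 2  16.74923   1    1 17.19531  17.19531 17.19531
3   protanopia 2  16.74923   1    0 15.79820  15.79820 15.79820
4   tritanopia 2  16.74923   1    1 18.14135  18.14135 18.14135
This function reports color distances rather than a pass/fail verdict. Here, tolerance is the minimum distance between colors in the original palette, ncp is the number of color pairs, and ndcp is the number of pairs that remain differentiable (distance at or above the tolerance) under each condition. In this output the single pair stays differentiable under deuteranopia and tritanopia, but under protanopia its distance (15.80) falls slightly below the tolerance (16.75), so ndcp drops to 0.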
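To pick out problematic conditions without reading the table by eye, the result can be filtered on its own columns. This is a small sketch that relies only on the ncp and ndcp columns shown in the output above; the object names check and conflictos are illustrative.

```r
library(colorblindcheck)

pal_sex <- c("Mujeres" = "#A9A9A9", "Hombres" = "#708090")

check <- palette_check(pal_sex)

# Conditions in which at least one color pair is no longer differentiable
conflictos <- subset(check, ndcp < ncp)
conflictos
```

With the output shown above, the filter would keep only the protanopia row.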
Check
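Before finalizing a visualization, it is recommended to test the graph using simulated CVD transformations. The deutan(), protan(), and tritan() functions from colorspace transform a set of colors to simulate deuteranopia, protanopia, and tritanopia, respectively. Applying the transformed palettes to the same plot and arranging the results with patchwork gives a panel that can be compared against normal vision.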
Example:
set.seed(123)
# Create a sequence of dates (12 weeks)
fechas <- seq.Date(from = as.Date("2023-01-01"),
                   by = "week",
                   length.out = 12)
# Generate simulated case counts by sex
datos <- tibble(
  fecha = rep(fechas, times = 2),
  sexo = rep(c("Mujeres", "Hombres"), each = length(fechas)),
  casos = c(
    # Women: smooth trend with fluctuation
    round(runif(12, min = 20, max = 60) + seq(0, 11) * 1.5),
    # Men: slightly higher values with more variability
    round(runif(12, min = 30, max = 75) + seq(0, 11) * 2)
  )
)
p <- datos |>
ggplot(aes(fecha, casos, color = sexo)) +
geom_line(linewidth = 1.2) +
scale_color_manual(values = pal_sex) +
labs(
title = "Incidencia de dengue",
x = "Semana epidemiológica",
y = "Número de casos") +
theme_minimal(base_size = 13)
# Generate a 4-panel simulation of color-vision variations
pal_sex_deutan <- deutan(pal_sex)
pal_sex_protan <- protan(pal_sex)
pal_sex_tritan <- tritan(pal_sex)
# show results
list(
original = pal_sex,
deutan = pal_sex_deutan,
protan = pal_sex_protan,
tritan = pal_sex_tritan
)
p1 <- p
p2 <- p +
scale_color_manual(values = pal_sex_deutan) +
labs(title = "Deuteranopia")
p3 <- p +
scale_color_manual(values = pal_sex_protan) +
labs(title = "Protanopia")
p4 <- p +
scale_color_manual(values = pal_sex_tritan) +
labs(title = "Tritanopia")
# Show the comparison
(p1 | p2) /
(p3 | p4)
The resulting panel helps determine whether line overlap or insufficient contrast could obscure epidemiological patterns for color-blind readers.
Sequential and diverging palettes also require diagnostic evaluation, especially when used in heatmaps or geospatial gradients where subtle hue variations carry meaningful information.
To ensure consistent accessibility across all visualizations:
Prefer palettes derived from perceptually uniform color spaces (e.g., CIELAB, HCL).
Avoid relying solely on color to encode meaning; incorporate line types, shapes, or annotations when appropriate.
Ensure minimum contrast ratios between adjacent colors or classes.
Test all finalized figures with deutan(), protan(), and tritan() prior to publication or dissemination.
By incorporating these diagnostics directly into the analytical workflow, the integrity and inclusiveness of data communication in epidemiological studies are strengthened, promoting clearer interpretation and equitable access to information.
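The recommendation not to rely on color alone can be sketched by mapping the same variable to color, line type, and point shape, so the groups remain distinguishable even if the colors collapse. The data below are simulated for illustration, and datos_demo and p_redundant are hypothetical names.

```r
library(ggplot2)

set.seed(123)
# Simulated weekly cases by sex (illustrative values only)
semanas <- rep(1:12, times = 2)
datos_demo <- data.frame(
  semana = semanas,
  sexo = rep(c("Mujeres", "Hombres"), each = 12),
  casos = round(runif(24, min = 20, max = 70) + semanas)
)

p_redundant <- ggplot(datos_demo,
                      aes(x = semana, y = casos,
                          color = sexo, linetype = sexo, shape = sexo)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2.5) +
  # Same two grays as pal_sex, repeated here so the block is self-contained
  scale_color_manual(values = c("Mujeres" = "#A9A9A9", "Hombres" = "#708090")) +
  labs(x = "Epidemiological week", y = "Cases",
       color = "Sex", linetype = "Sex", shape = "Sex") +
  theme_minimal(base_size = 13)

p_redundant
```

Giving the three legends the same title lets ggplot2 merge them into a single key.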
Read the data
First, we need to get the data.
All the datasets used in this module can be downloaded from https://github.com/ae-tafur/data_visualization/tree/main/05_projects/excercises/data. Pick any of the available datasets, leave the folder in your Downloads directory, and load the file, replacing the path and file name in the call below with your own.
data <- read.csv("~/Downloads/example_data.csv")
Alternatively, the data can be read directly from the repository through a URL. This is the option used here.
url <- "https://raw.githubusercontent.com/ae-tafur/data_visualization/main"
terridata <- read_delim(
  file.path(url, "02_data_exploration/excercises/data/TerriData20570.txt"),
  delim = "|")
Rows: 11140 Columns: 13
── Column specification ────────────────────────────────────────────────────────
Delimiter: "|"
chr (9): Departamento, Entidad, Dimensión, Subcategoría, Indicador, Dato Num...
dbl (4): Código Departamento, Código Entidad, Año, Mes
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Now explore the data
terridata |>
slice(1:20) |>
kable(format = "html", table.attr = "class='table table-striped'")
| Código Departamento | Departamento | Código Entidad | Entidad | Dimensión | Subcategoría | Indicador | Dato Numérico | Dato Cualitativo | Año | Mes | Fuente | Unidad de Medida |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 20 | Cesar | 20570 | Pueblo Bello | Descripción general | Descripción general | Código DANE | NA | 20570 | 2000 | 0 | DANE | Texto |
| 20 | Cesar | 20570 | Pueblo Bello | Descripción general | Descripción general | Región | NA | Caribe | 2000 | 0 | DANE | Texto |
| 20 | Cesar | 20570 | Pueblo Bello | Descripción general | Descripción general | Subregión (SGR) | NA | Norte | 2000 | 0 | DNP | Texto |
| 20 | Cesar | 20570 | Pueblo Bello | Descripción general | Descripción general | Categoría ley 617 de 2000 | NA | 6 | 2000 | 0 | Ley 617 de 2000 | Texto |
| 20 | Cesar | 20570 | Pueblo Bello | Descripción general | Descripción general | Categoría ley 617 de 2000 | NA | 6 | 2018 | 12 | Ley 617 de 2000 | Texto |
| 20 | Cesar | 20570 | Pueblo Bello | Descripción general | Descripción general | Categoría ley 617 de 2000 | NA | 6 | 2019 | 12 | Ley 617 de 2000 | Texto |
| 20 | Cesar | 20570 | Pueblo Bello | Descripción general | Descripción general | Categoría ley 617 de 2000 | NA | 6 | 2020 | 12 | Ley 617 de 2000 | Texto |
| 20 | Cesar | 20570 | Pueblo Bello | Descripción general | Descripción general | Categoría ley 617 de 2000 | NA | 6 | 2021 | 12 | Ley 617 de 2000 | Texto |
| 20 | Cesar | 20570 | Pueblo Bello | Descripción general | Descripción general | Categoría ley 617 de 2000 | NA | 6 | 2022 | 12 | Ley 617 de 2000 | Texto |
| 20 | Cesar | 20570 | Pueblo Bello | Descripción general | Descripción general | Categoría ley 617 de 2000 | NA | 6 | 2023 | 12 | Ley 617 de 2000 | Texto |
| 20 | Cesar | 20570 | Pueblo Bello | Descripción general | Descripción general | Categoría ley 617 de 2000 | NA | 6 | 2024 | 12 | Ley 617 de 2000 | Texto |
| 20 | Cesar | 20570 | Pueblo Bello | Descripción general | Descripción general | Categoría de ruralidad | NA | Rural disperso | 2000 | 0 | DNP | Texto |
| 20 | Cesar | 20570 | Pueblo Bello | Descripción general | Descripción general | Extensión | 859,00 | NA | 2017 | 3 | IGAC | Kilómetros cuadrados |
| 20 | Cesar | 20570 | Pueblo Bello | Descripción general | Descripción general | Población total | 27.007,00 | NA | 2018 | 12 | DANE | Personas |
| 20 | Cesar | 20570 | Pueblo Bello | Descripción general | Descripción general | Población total | 28.298,00 | NA | 2019 | 12 | DANE | Personas |
| 20 | Cesar | 20570 | Pueblo Bello | Descripción general | Descripción general | Población total | 29.017,00 | NA | 2020 | 12 | DANE | Personas |
| 20 | Cesar | 20570 | Pueblo Bello | Descripción general | Descripción general | Población total | 29.706,00 | NA | 2021 | 12 | DANE | Personas |
| 20 | Cesar | 20570 | Pueblo Bello | Descripción general | Descripción general | Población total | 30.292,00 | NA | 2022 | 12 | DANE | Personas |
| 20 | Cesar | 20570 | Pueblo Bello | Descripción general | Descripción general | Población total | 30.844,00 | NA | 2023 | 12 | DANE | Personas |
| 20 | Cesar | 20570 | Pueblo Bello | Descripción general | Descripción general | Población total | 31.317,00 | NA | 2024 | 12 | DANE | Personas |
As you can see, Dato Numérico has a problem: it was imported as character even though it holds numbers. This is caused by regional number formatting: in Colombia, . is used as the thousands separator and , as the decimal separator. So, let’s fix this.
terridata <- terridata |>
mutate(`Dato Numérico` = parse_number(`Dato Numérico`,
locale = locale(decimal_mark = ",",
grouping_mark = ".")))
terridata |>
slice(1:20) |>
kable(format = "html", table.attr = "class='table table-striped'")
| Código Departamento | Departamento | Código Entidad | Entidad | Dimensión | Subcategoría | Indicador | Dato Numérico | Dato Cualitativo | Año | Mes | Fuente | Unidad de Medida |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 20 | Cesar | 20570 | Pueblo Bello | Descripción general | Descripción general | Código DANE | NA | 20570 | 2000 | 0 | DANE | Texto |
| 20 | Cesar | 20570 | Pueblo Bello | Descripción general | Descripción general | Región | NA | Caribe | 2000 | 0 | DANE | Texto |
| 20 | Cesar | 20570 | Pueblo Bello | Descripción general | Descripción general | Subregión (SGR) | NA | Norte | 2000 | 0 | DNP | Texto |
| 20 | Cesar | 20570 | Pueblo Bello | Descripción general | Descripción general | Categoría ley 617 de 2000 | NA | 6 | 2000 | 0 | Ley 617 de 2000 | Texto |
| 20 | Cesar | 20570 | Pueblo Bello | Descripción general | Descripción general | Categoría ley 617 de 2000 | NA | 6 | 2018 | 12 | Ley 617 de 2000 | Texto |
| 20 | Cesar | 20570 | Pueblo Bello | Descripción general | Descripción general | Categoría ley 617 de 2000 | NA | 6 | 2019 | 12 | Ley 617 de 2000 | Texto |
| 20 | Cesar | 20570 | Pueblo Bello | Descripción general | Descripción general | Categoría ley 617 de 2000 | NA | 6 | 2020 | 12 | Ley 617 de 2000 | Texto |
| 20 | Cesar | 20570 | Pueblo Bello | Descripción general | Descripción general | Categoría ley 617 de 2000 | NA | 6 | 2021 | 12 | Ley 617 de 2000 | Texto |
| 20 | Cesar | 20570 | Pueblo Bello | Descripción general | Descripción general | Categoría ley 617 de 2000 | NA | 6 | 2022 | 12 | Ley 617 de 2000 | Texto |
| 20 | Cesar | 20570 | Pueblo Bello | Descripción general | Descripción general | Categoría ley 617 de 2000 | NA | 6 | 2023 | 12 | Ley 617 de 2000 | Texto |
| 20 | Cesar | 20570 | Pueblo Bello | Descripción general | Descripción general | Categoría ley 617 de 2000 | NA | 6 | 2024 | 12 | Ley 617 de 2000 | Texto |
| 20 | Cesar | 20570 | Pueblo Bello | Descripción general | Descripción general | Categoría de ruralidad | NA | Rural disperso | 2000 | 0 | DNP | Texto |
| 20 | Cesar | 20570 | Pueblo Bello | Descripción general | Descripción general | Extensión | 859 | NA | 2017 | 3 | IGAC | Kilómetros cuadrados |
| 20 | Cesar | 20570 | Pueblo Bello | Descripción general | Descripción general | Población total | 27007 | NA | 2018 | 12 | DANE | Personas |
| 20 | Cesar | 20570 | Pueblo Bello | Descripción general | Descripción general | Población total | 28298 | NA | 2019 | 12 | DANE | Personas |
| 20 | Cesar | 20570 | Pueblo Bello | Descripción general | Descripción general | Población total | 29017 | NA | 2020 | 12 | DANE | Personas |
| 20 | Cesar | 20570 | Pueblo Bello | Descripción general | Descripción general | Población total | 29706 | NA | 2021 | 12 | DANE | Personas |
| 20 | Cesar | 20570 | Pueblo Bello | Descripción general | Descripción general | Población total | 30292 | NA | 2022 | 12 | DANE | Personas |
| 20 | Cesar | 20570 | Pueblo Bello | Descripción general | Descripción general | Población total | 30844 | NA | 2023 | 12 | DANE | Personas |
| 20 | Cesar | 20570 | Pueblo Bello | Descripción general | Descripción general | Población total | 31317 | NA | 2024 | 12 | DANE | Personas |
Perfect, now we can work with this data.
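To see the locale-aware parsing in isolation, the sketch below applies parse_number() to three strings taken from the Dato Numérico column of the first table. The object names col_locale, valores, and parsed are illustrative.

```r
library(readr)

# Colombian number format: "." groups thousands, "," marks decimals
col_locale <- locale(decimal_mark = ",", grouping_mark = ".")

# Values copied from the table above (Extensión and Población total)
valores <- c("859,00", "27.007,00", "31.317,00")

parsed <- parse_number(valores, locale = col_locale)
parsed
# → 859 27007 31317
```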
Basic statistics
Check statistics
summary(terridata)
Código Departamento Departamento Código Entidad Entidad
Min. :20 Length:11140 Min. :20570 Length:11140
1st Qu.:20 Class :character 1st Qu.:20570 Class :character
Median :20 Mode :character Median :20570 Mode :character
Mean :20 Mean :20570
3rd Qu.:20 3rd Qu.:20570
Max. :20 Max. :20570
Dimensión Subcategoría Indicador Dato Numérico
Length:11140 Length:11140 Length:11140 Min. :-1.039e+04
Class :character Class :character Class :character 1st Qu.: 3.000e+00
Mode :character Mode :character Mode :character Median : 4.500e+01
Mean : 1.536e+08
3rd Qu.: 9.090e+02
Max. : 7.435e+10
NA's :1906
Dato Cualitativo Año Mes Fuente
Length:11140 Min. :1985 Min. : 0.00 Length:11140
Class :character 1st Qu.:2013 1st Qu.:12.00 Class :character
Mode :character Median :2018 Median :12.00 Mode :character
Mean :2017 Mean :11.81
3rd Qu.:2021 3rd Qu.:12.00
Max. :2042 Max. :12.00
Unidad de Medida
Length:11140
Class :character
Mode :character
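However, these results are not very informative for this dataset: it is stored in long format, so Dato Numérico mixes indicators measured in different units (for example, square kilometers and persons) and its overall mean or maximum has no meaningful interpretation.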
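For long-format data, summaries become interpretable once they are computed per indicator. The sketch below uses a toy extract built from rows shown in the table above (with simplified, ASCII column names chosen for this example) rather than the full terridata object.

```r
library(dplyr)

# Toy extract in long format, using values from the table above;
# column names are simplified for this example
extracto <- tibble(
  indicador = c("Extensión", rep("Población total", 7)),
  unidad = c("Kilómetros cuadrados", rep("Personas", 7)),
  anio = c(2017, 2018:2024),
  valor = c(859, 27007, 28298, 29017, 29706, 30292, 30844, 31317)
)

# One row per indicator, so each statistic refers to a single unit
resumen <- extracto |>
  group_by(indicador, unidad) |>
  summarise(n = n(),
            minimo = min(valor),
            maximo = max(valor),
            .groups = "drop")

resumen
```

The same group_by() and summarise() pattern would apply to terridata, grouping by Indicador and Unidad de Medida.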
Creating plots
One of the main uses of TerriData is to obtain demographic data. Let’s build a population pyramid.
terridata |>
  filter(Año %in% c(2020, 2025, 2030)) |>
  filter(str_starts(Indicador, "Porcentaje de población de")) |>
  filter(str_starts(Subcategoría, "Población de")) |>
  filter(str_detect(Fuente, "Censo 2018")) |>
  mutate(
    # Plot women on the negative side so the two sexes mirror each other
    `Dato Numérico` = ifelse(str_detect(Indicador, "mujeres"),
                             -`Dato Numérico`, `Dato Numérico`),
    # Remove the leading phrase so only the age group remains; a grouped
    # pattern is used because a [...] character class would delete
    # individual letters instead of the phrase
    Indicador = str_remove(
      Indicador,
      "^Porcentaje de población de (mujeres|hombres)( de)?\\s*"),
    Subcategoría = str_remove_all(Subcategoría, "Población de ")) |>
  ggplot(aes(x = `Dato Numérico`,
             y = Indicador,
             fill = str_to_title(Subcategoría),
             color = as.character(Año),
             group = as.character(Año))) +
  geom_col(position = "identity") +
  scale_fill_manual(values = pal_sex) +
  scale_color_manual(values = c("#000000", "#9D02D7", "#F5275E")) +
  scale_x_continuous(breaks = seq(-7, 7, 2), labels = abs) +
  theme_minimal() +
  theme(plot.caption = element_text(hjust = 0)) +
  labs(fill = "Grupo",
       color = "Año",
       x = "Porcentaje de la población total",
       y = "Quinquenios de edad")
Another plot can show access to energy, water, and sanitation.
terridata |>
filter(Indicador == "Cobertura de acueducto urbana (REC)" |
Indicador == "Cobertura de acueducto rural (REC)" |
Indicador == "Cobertura de alcantarillado urbana (REC)" |
Indicador == "Cobertura de alcantarillado rural (REC)" |
Indicador == "Cobertura de Energía Eléctrica Urbana (Censo)" |
Indicador == "Cobertura de Energía Eléctrica Rural (Censo)") |>
rowwise() |>
mutate(Grupo = ifelse(str_detect(str_to_lower(Indicador), "urbana"),
"Urbana", "Rural"),
Indicador = ifelse(str_detect(Indicador, "Energía"), "Energía",
ifelse(str_detect(Indicador, "acueducto"),
"Acueducto", "Alcantarillado"))) |>
ggplot(aes(y = `Dato Numérico`,
x = as.character(Año),
fill = Grupo)) +
geom_col(position = "dodge") +
facet_wrap(~Indicador, ncol = 1, scales = "free_x") +
scale_fill_manual(values = pal_qual_set2) +
theme_minimal() +
theme(legend.position = "bottom",
plot.caption = element_text(hjust = 0)) +
labs(fill = "",
x = "Año",
y = "Cobertura (%)")
# Prepare breaks and labels safely
x_breaks <- terridata |>
filter(Indicador %in% c(
"Cobertura de acueducto urbana (REC)",
"Cobertura de acueducto rural (REC)",
"Cobertura de alcantarillado urbana (REC)",
"Cobertura de alcantarillado rural (REC)",
"Cobertura de Energía Eléctrica Urbana (Censo)",
"Cobertura de Energía Eléctrica Rural (Censo)")) |>
filter(Año >= 2018) |>
distinct(Año) |>
arrange(Año) |>
mutate(Año_num = as.numeric(as.factor(Año)))
breaks_vec <- x_breaks$Año_num
labels_vec <- x_breaks$Año
terridata |>
filter(Indicador %in% c(
"Cobertura de acueducto urbana (REC)",
"Cobertura de acueducto rural (REC)",
"Cobertura de alcantarillado urbana (REC)",
"Cobertura de alcantarillado rural (REC)",
"Cobertura de Energía Eléctrica Urbana (Censo)",
"Cobertura de Energía Eléctrica Rural (Censo)")) |>
filter(Año >= 2018) |>
mutate(Grupo = ifelse(str_detect(str_to_lower(Indicador), "urbana"),
"Urbana", "Rural"),
Indicador = case_when(str_detect(Indicador, "Energía") ~ "Energía",
str_detect(Indicador, "acueducto") ~ "Acueducto",
TRUE ~ "Alcantarillado"),
Año_num = as.numeric(as.factor(Año)),
offset = ifelse(Grupo == "Urbana", -0.15, 0.15),
x_pos = Año_num + offset) |>
ggplot(aes(x = x_pos,
y = `Dato Numérico`,
color = Grupo)) +
geom_segment(aes(x = x_pos,
xend = x_pos,
y = 0,
yend = `Dato Numérico`),
linewidth = 1.1,
alpha = 0.8) +
geom_point(size = 5) +
geom_text(aes(label = round(`Dato Numérico`, 1)),
vjust = 0.5,
hjust = 1.4,
size = 3.5,
fontface = "bold",
show.legend = FALSE) +
facet_wrap(~Indicador, ncol = 1, scales = "free_x") +
scale_color_manual(values = pal_qual_set2) +
scale_x_continuous(breaks = breaks_vec,
labels = labels_vec) +
theme_minimal(base_size = 13) +
theme(legend.position = "bottom",
plot.caption = element_text(hjust = 0),
strip.text = element_text(face = "bold"),
axis.title.y = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_blank()) +
labs(color = "",
x = "Año",
title = "Cobertura por Servicios y Zona (%)")
terridata |>
filter(str_detect(Subcategoría, "Acceso a la educación")) |>
filter(!str_detect(Indicador, "superior")) |>
filter(!str_ends(Indicador, "Total")) |>
filter(Año >= 2016) |>
rowwise() |>
mutate(Grupo = ifelse(str_detect(Indicador, "bruta"), "Bruta", "Neta"),
Indicador = str_trim(str_sub(Indicador,19))) |>
ggplot(aes(y = `Dato Numérico`,
x = Año,
group = Indicador,
color = Indicador)) +
geom_point(size = 3, shape = 1) +
geom_point(size = 1.5) +
geom_line(linewidth = 0.5) +
facet_wrap(~Grupo, scales = "free_x", ncol = 1) +
scale_color_manual(values = pal_qual_dark3,
labels = function(x) stringr::str_to_sentence(x)) +
theme_minimal() +
theme(plot.caption = element_text(hjust = 0)) +
labs(color = "",
x = "",
y = "Cobertura")
terridata |>
filter(Año %in% c(2015, 2022)) |>
filter(str_detect(Subcategoría, "Acceso a la educación")) |>
filter(!str_detect(Indicador, "superior")) |>
mutate(Grupo = ifelse(str_detect(Indicador, "bruta"), "Bruta", "Neta"),
Indicador = str_trim(str_sub(Indicador, 19))) |>
ggplot(aes(x = Año,
y = `Dato Numérico`,
group = Indicador,
color = Indicador)) +
geom_line(linewidth = 1) +
geom_point(size = 3) +
facet_wrap(~Grupo) +
scale_color_manual(values = pal_qual_dark3,
labels = stringr::str_to_sentence) +
theme_minimal() +
labs(x = "",
y = "Cobertura",
color = "")
terridata |>
filter(str_detect(Subcategoría, "Acceso a la educación")) |>
filter(!str_detect(Indicador, "superior")) |>
filter(!str_ends(Indicador, "Total")) |>
filter(Año >= 2016) |>
mutate(Grupo = ifelse(str_detect(Indicador, "bruta"), "Bruta", "Neta"),
Indicador = str_trim(str_sub(Indicador, 19))) |>
ggplot(aes(x = Año,
y = `Dato Numérico`,
fill = Indicador)) +
geom_area(position = "fill", alpha = 0.85) +
facet_wrap(~Grupo) +
scale_fill_manual(values = pal_qual_dark3,
labels = stringr::str_to_sentence) +
theme_minimal() +
labs(x = "",
y = "Proporción",
fill = "")
Simulated dengue data
Let’s create a simulated dataset.
set.seed(123)
# General parameters
anios <- 2019:2023
semanas <- 1:52
# Time base: one row per year and week
datos <- expand.grid(Año = anios,
                     Semana = semanas)
# Seasonal component (typical dengue pattern)
datos <- datos |>
  mutate(estacional = 20 + 15 * sin(2 * pi * Semana / 52),
         ruido = rpois(n(), lambda = 5),
         casos_base = round(estacional + ruido))
# Create an epidemic outbreak in 2023 (weeks 20-32)
datos <- datos |>
  mutate(brote = ifelse(Año == 2023 & Semana %in% 20:32,
                        rpois(n(), lambda = 40),
                        0),
         casos = casos_base + brote) |>
  select(Año, Semana, casos)
Now, plot an endemic channel.
# Endemic channel built from the historical years
canal_endemico <- datos |>
  filter(Año < 2023) |>
  group_by(Semana) |>
  summarise(p10 = quantile(casos, 0.10),
            p25 = quantile(casos, 0.25),
            p50 = quantile(casos, 0.50),
            p75 = quantile(casos, 0.75),
            p90 = quantile(casos, 0.90),
            .groups = "drop")
# Data for the evaluation year
casos_2023 <- datos |>
  filter(Año == 2023)
canal_endemico |>
ggplot(aes(x = Semana)) +
# Epidemic / outbreak zone
geom_area(aes(y = p90, fill = "Brote"), alpha = 0.8) +
# Alert zone
geom_area(aes(y = p75, fill = "Alerta"), alpha = 0.8) +
# Safety zone
geom_area(aes(y = p25, fill = "Seguridad"), alpha = 0.8) +
# Success zone
geom_area(aes(y = p10, fill = "Éxito"), alpha = 0.8) +
# Central percentile (median)
geom_line(aes(y = p50), color = "black", linewidth = 1) +
# Observed cases (2023)
geom_point(data = casos_2023,
aes(x = Semana, y = casos),
color = "#D55E00",
size = 2.2) +
scale_fill_manual(values = c("Éxito" = "#E5F5E0",
"Seguridad" = "#A1D99B",
"Alerta" = "#FCBBA1",
"Brote" = "#CB181D")) +
theme_minimal(base_size = 13) +
theme(legend.position = "bottom",
panel.grid.minor = element_blank()) +
labs(
title = "Canal endémico de dengue (casos semanales)",
subtitle = "Construido con percentiles históricos (2019–2022)",
x = "Semana epidemiológica",
y = "Número de casos",
fill = "Zona epidemiológica",
caption = "Puntos: casos observados en 2023 (datos simulados)")
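Besides reading the endemic channel visually, the weeks above the upper band can be listed programmatically. The sketch below rebuilds a comparable simulated series (with ASCII column names, so the random draws and names differ from the objects above) and flags the 2023 weeks whose count exceeds the historical 90th percentile. The names sim, umbral, and semanas_brote are illustrative, and the 90th percentile is only one possible threshold.

```r
library(dplyr)

set.seed(123)
# Comparable simulated series: seasonal pattern + noise + 2023 outbreak
sim <- expand.grid(anio = 2019:2023, semana = 1:52) |>
  mutate(casos = round(20 + 15 * sin(2 * pi * semana / 52)) +
           rpois(n(), lambda = 5) +
           ifelse(anio == 2023 & semana %in% 20:32,
                  rpois(n(), lambda = 40), 0))

# Historical 90th percentile per week (2019 to 2022)
umbral <- sim |>
  filter(anio < 2023) |>
  group_by(semana) |>
  summarise(p90 = unname(quantile(casos, 0.90)), .groups = "drop")

# Weeks of 2023 whose observed count exceeds the historical p90
semanas_brote <- sim |>
  filter(anio == 2023) |>
  left_join(umbral, by = "semana") |>
  filter(casos > p90) |>
  arrange(semana)

semanas_brote
```

With only four historical years per week, the percentiles are coarse, so isolated weeks outside the simulated outbreak may also be flagged.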